Beyond Hebb: Exclusive-OR and Biological Learning* 
A learning algorithm for multilayer neural networks based on biologically plausible mechanisms is 
studied. Motivated by findings in experimental neurobiology, we consider synaptic averaging in the 
induction of plasticity changes, which happen on a slower time scale than firing dynamics. This 
mechanism is shown to enable learning of the exclusive-OR (XOR) problem without the aid of error 
backpropagation, as well as to increase robustness of learning in the presence of noise. 

PACS numbers: 87.17.Aa, 82.20.Wt, 87.19.La 



Since the early days of neurophysiology we have had
evidence of biological mechanisms serving as the basis for
learning and information processing in the brain. Cajal's
pictures showing networks of intertwined nerve cells
readily lead to the hypothesis of information flow and
processing in these networks [1]. Subsequently formulated
theoretical models of the neuron, such as that of McCulloch
and Pitts [2], and the Hebbian learning rule, postulating
synaptic strengthening for simultaneous pre- and
postsynaptic activity [3], sparked the development of
algorithms for neuronal learning and memory. The
development of learning algorithms, however, took place
almost decoupled from biological validation, partly due to
a lack of detailed knowledge of the neurophysiology of
learning, but also due to their success in applied fields
("connectionism", "machine learning"). Among the first
models were layered assemblies of formal neurons
(Perceptrons) combined with gradient rules defining the
synaptic weights [4]. Later, combining Hebb's strictly
local rule with symmetrically connected formal neurons
defined the Hopfield model of simple associative learning
[5]. However, only a complicated non-local learning rule,
now known as error backpropagation, was finally able to
solve simple nonlinear learning problems such as learning
the exclusive-or (XOR) function [6]. This complicated form
of reverse information transfer, however, has not been
observed in biological circuits [7].

For computation in biological nervous systems the question
remains which underlying biological processes are capable
of the most general form of learning [8], including
problems of the XOR class. A more biologically plausible
learning concept is learning by reinforcement, and recently
a number of models along this line have been formulated
[9], [10]. One such model by Barto and Anandan combines a
local mechanism of synaptic plasticity changes with a
global feedback signal indicating information worth
memorizing [11], [12]. A remaining problem in these models
is the regulation of the mean activity level in large
networks, which has been attacked by Alstrøm and
Stassinopoulos [13] and Stassinopoulos and Bak [14], [15].
An even more elegant mechanism has been proposed by Chialvo
and Bak [16], with reinforcement through negative feedback,
which is motivated by the observed long-term depression
(LTD) in biological networks. In this algorithm, the
dynamics of synaptic plasticity comes to a halt when
learning reaches its goal, just by definition. While we
think that this is a very interesting approach to
formulating a biologically plausible learning mechanism,
this model suffers a severe restriction in learning
capabilities. It has been shown to work well on simple
tasks such as non-overlapping pattern sets; however, it is
not able to learn tasks such as the XOR function, at least
not without unreasonably large numbers of neurons and very
long learning times. It is, therefore, nearly as limited as
the early single layer perceptron models that, for this
reason, nearly paralyzed the research in neural networks in
the seventies (mainly following the sobering analysis of
perceptron capabilities by Minsky and Papert [17], [18]).

In the following we will study a model in this spirit 
which, however, does not exhibit this restriction. Let us 
first define the model, then report numerical results on 
its learning capabilities. We will then discuss the robust- 
ness of our model in the presence of noise. The letter con- 
cludes with a discussion of the motivation of our model 
from current findings in neurobiology. 

Consider a layered network of binary formal neurons
$x_i \in \{0,1\}$, with $I$ input sites
$x_0, \ldots, x_{I-1}$, $J$ hidden sites
$x_I, \ldots, x_{I+J-1}$, and $K$ output units
$x_{I+J}, \ldots, x_{I+J+K-1}$. The adjacent layers are
completely connected by weights $w_{ji}$ from each input to
each hidden unit and from each hidden unit to each output
unit. In addition, each weight is assigned an internal
degree of freedom, acknowledging the finite time scale of
synaptic plasticity induction as will be discussed below.
In the model this is represented by an additional discrete
variable $c_{ji}$ associated with each weight $w_{ji}$,
serving as a synaptic memory during learning.

The network dynamics is defined by the following steps. The
input sites are activated with external stimuli
$x_0, \ldots, x_{I-1}$. Each hidden node $j$ then receives
a weighted input $h_j = \sum_{i=0}^{I-1} w_{ji} x_i$. Its
state is chosen according to a probabilistic rule such that
each hidden neuron fires with probability
$p_j = a^{-1} \exp(\beta h_j)$, with the normalization
$a = \sum_{j'} \exp(\beta h_{j'})$. We consider the low
activity limit of the network where only one hidden neuron
fires at a time. Each output neuron $k$ now receives an
input sum $h_k = \sum_{j=I}^{I+J-1} w_{kj} x_j$ with the
only non-zero contribution from the firing hidden neuron
$j^*$, such that $h_k = w_{kj^*}$. The above probabilistic
rule applies to the output layer as well, determining one
firing output neuron $x_{k^*}$ which represents the output
of the network corresponding to a given input pattern. Note
that in the low activity limit used here, the probabilistic
rule is a stochastic approximation of the winner-take-all
rule [19]. We think our variant based on local dynamics is
biologically more realistic than supplying global
information of which neuron has the highest input sum
within a layer. In the limit $\beta \to \infty$, the
neuronal activity in our model follows exact
winner-take-all dynamics, since then
$\max_j p_j = a^{-1} \exp(\beta \max_j h_j) \to 1$. This
deterministic case has been used in the network model of
Chialvo and Bak [16]. Here, however, we consider stochastic
models with finite values of $\beta$.
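
The firing rule above can be sketched in a few lines of
Python. This is a minimal illustration, not the authors'
implementation: the function names (firing_probabilities,
pick_winner, forward), the nested-list weight layout, and
the random seed are assumptions made here.

```python
import math
import random

def firing_probabilities(h, beta):
    """p_j = exp(beta * h_j) / a, with a = sum_j exp(beta * h_j)."""
    h_max = max(h)  # subtracting a constant from all h_j leaves p_j unchanged
    e = [math.exp(beta * (hj - h_max)) for hj in h]
    a = sum(e)
    return [ej / a for ej in e]

def pick_winner(h, beta, rng):
    """Draw the single firing neuron of a layer (low activity limit)."""
    p = firing_probabilities(h, beta)
    return rng.choices(range(len(h)), weights=p)[0]

def forward(x, w_hid, w_out, beta, rng):
    """x: binary inputs; w_hid[j][i]: input i -> hidden j;
    w_out[k][j]: hidden j -> output k. Returns (j_star, k_star)."""
    h_hid = [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) for row in w_hid]
    j_star = pick_winner(h_hid, beta, rng)
    h_out = [row[j_star] for row in w_out]  # h_k = w_{k j*}
    k_star = pick_winner(h_out, beta, rng)
    return j_star, k_star

# Example: 3 input sites (x_0 = 1 is a bias), 3 hidden, 2 output neurons.
rng = random.Random(0)  # seed chosen arbitrarily for this sketch
x = [1, 0, 1]
w_hid = [[rng.random() for _ in range(3)] for _ in range(3)]
w_out = [[rng.random() for _ in range(3)] for _ in range(2)]
j_star, k_star = forward(x, w_hid, w_out, beta=10.0, rng=rng)
```

For large beta the distribution returned by
firing_probabilities concentrates on the neuron with the
largest input sum, which is the winner-take-all limit
mentioned in the text.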

Now it remains to specify the learning dynamics of 
the network weights Wji themselves. For each activation 
pattern, the network output is compared to the target
output and a feedback signal $r$ is returned to the network,
with $r = +1$ if its output neurons represent the
predefined target output, given the current input, and
$r = -1$ otherwise. Depending on this binary feedback,
connections and corresponding counter values are updated. All
"active" synapses w (and corresponding counter values 
c) for which pre- and postsynaptic sites have been simul- 
taneously active are updated as follows. The feedback 
signal is subtracted from the memory c of each active 
synapse according to: 

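$$
c \to c' =
\begin{cases}
\Theta, & \text{if } c - r > \Theta \quad (*) \\
c - r, & \text{if } \Theta \ge c - r \ge 0 \\
0, & \text{if } 0 > c - r.
\end{cases}
\qquad (1)
$$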

Thus, each counter $c$ is an error account of the
corresponding synapse. The capacity of the account is given
by the memory size $\Theta$. In case this threshold is
exceeded [marked by (*) in Eq. (1)] the synapse is
penalized, i.e., it is weakened by a constant amount
$\delta$:

$$ w \to w' = w - \delta. \qquad (2) $$

(Alternatively, a multiplicative penalty combined with a
constant growth of weights has been successfully checked,
too.) Therefore, the counter averages over the record of a
synapse, instead of penalizing each single error at the
moment it occurs. Note that the model by Chialvo and Bak
[16] is just this latter case and is obtained by setting
$\Theta = 0$ and $\beta = \infty$. After these changes to
weights and counters the learning cycle is iterated by
presenting another (possibly different) pattern of stimuli
to the network.
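
The counter rule of Eq. (1) and the penalty of Eq. (2) can
be written as a small update function for a single active
synapse. This is a minimal sketch; the function name
update_synapse and the argument names theta and delta are
chosen here for illustration.

```python
def update_synapse(w, c, r, theta, delta):
    """Apply Eqs. (1) and (2) to one active synapse.

    w: weight, c: counter (error account), r: feedback (+1 or -1),
    theta: memory size, delta: penalty. Returns the new (w, c).
    """
    c_new = c - r
    if c_new > theta:       # case (*): account exceeded, penalize the synapse
        return w - delta, theta
    if c_new < 0:           # the account cannot drop below zero
        return w, 0
    return w, c_new

# With theta = 0 every failure is penalized immediately.
print(update_synapse(1.0, 0, -1, theta=0, delta=1.0))
# With theta = 1 the first failure is only recorded ...
print(update_synapse(1.0, 0, -1, theta=1, delta=1.0))
# ... and a second consecutive failure triggers the penalty.
print(update_synapse(1.0, 1, -1, theta=1, delta=1.0))
```

A success (r = +1) reduces the account by one, so only
failures that accumulate beyond theta lead to a weight change.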

Note that $\beta$ and $\delta$ are not independent
parameters; changing the value of $\delta$ does not affect
the dynamics, as long as the product $\beta\delta$ is kept
constant and the weights are rescaled correspondingly.
Furthermore, the firing probabilities are conserved under
transformations that add the same value to all outgoing
connections of one neuron. We could therefore keep the
values of the weights in a bounded domain without changing
the model dynamics.
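
Both invariances can be checked directly from the firing
rule $p_j = a^{-1} \exp(\beta h_j)$. The following short
derivation is a sketch; the scale factor $\lambda > 0$ and
the offset $s_i$ are symbols introduced here for
illustration.

```latex
% Rescaling: w -> \lambda w, \delta -> \lambda\delta, \beta -> \beta/\lambda
\beta h_j \;\to\; \frac{\beta}{\lambda} \sum_i \lambda w_{ji} x_i = \beta h_j ,
\qquad
\beta\delta \;\to\; \frac{\beta}{\lambda}\,\lambda\delta = \beta\delta .

% Offset: w_{ji} -> w_{ji} + s_i for all targets j of one presynaptic site i
p_j \;\to\;
\frac{\exp\big(\beta (h_j + s_i x_i)\big)}
     {\sum_{j'} \exp\big(\beta (h_{j'} + s_i x_i)\big)}
= \frac{e^{\beta s_i x_i} \exp(\beta h_j)}
       {e^{\beta s_i x_i} \sum_{j'} \exp(\beta h_{j'})}
= p_j .
```

In the first case both the exponents and the size of each
penalty step measured in units of $1/\beta$ are unchanged;
in the second the common factor cancels in the
normalization.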

Let us next demonstrate the learning capability and
robustness of the model by simulating an XOR learning task.
The network used has $I = 3$ input sites $x_0$, $x_1$, and
$x_2$, with the input site $x_0 = 1$ serving as bias. The
hidden layer has $J = 3$ neurons, the minimum number
necessary to represent the XOR function in the present
architecture. $K = 2$ output neurons represent the two
possible outcomes with only one of them active at a time.
The initial values of the weights $w$ are uniformly chosen
random numbers $\in [0, 1]$, all counters $c$ are set to 0,
and $\delta = 1$. The four patterns of the XOR function are
presented with equal probability. Fig. 1 shows learning
curves for memory sizes $\Theta = 0, 1, 2$ with
$\beta = 10$, averaged over 10000 independent runs each.

FIG. 1. Learning curves show the effect of the internal
synaptic memory under weak noise ($\beta = 10$). In the
case of synapses without internal memory ($\Theta = 0$) the
error remains close to 0.5; practically no adaptation to
the desired function (XOR) takes place. However, networks
with one-step memory synapses ($\Theta = 1$) quickly reduce
the residual error, indicating a fast adaptation process.
Increasing the memory length ($\Theta = 2$) leads to even
more efficient learning. Each learning curve is an ensemble
average of 10000 independent runs.

The displayed error $E$ is the fraction of simulation runs
that have produced an incorrect output at the considered
time step. We find that learning takes place with
$\Theta \ge 1$ only, where the error quickly converges to
zero, whereas with $\Theta = 0$, as in the model of Chialvo
and Bak [16], no learning takes place at all. The error
remains nearly constant, hardly below the "default" of 0.5
(this holds for the whole simulation time of 100,000
trials, not shown here).
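
The XOR experiment described above can be sketched as
follows. This is a minimal illustration under stated
assumptions, not the authors' code: the function names
(run_once, learning_curve), the seed, and the small default
numbers of runs and trials are choices made here, and
"active" synapses are taken to be those from active input
sites to the firing hidden neuron plus the one from that
hidden neuron to the firing output neuron.

```python
import math
import random

# (x_0 = 1 bias, x_1, x_2) -> index of the target output neuron (XOR)
PATTERNS = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]

def pick_winner(h, beta, rng):
    """Draw one firing neuron with probability proportional to exp(beta*h)."""
    h_max = max(h)
    weights = [math.exp(beta * (hj - h_max)) for hj in h]
    return rng.choices(range(len(h)), weights=weights)[0]

def run_once(theta, beta, delta, trials, rng, n_hidden=3, n_out=2):
    """Train one network; return a 0/1 error flag for every trial."""
    n_in = 3
    w_hid = [[rng.random() for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [[rng.random() for _ in range(n_hidden)] for _ in range(n_out)]
    c_hid = [[0] * n_in for _ in range(n_hidden)]
    c_out = [[0] * n_hidden for _ in range(n_out)]
    errors = []
    for _ in range(trials):
        x, target = rng.choice(PATTERNS)
        h = [sum(w * xi for w, xi in zip(row, x)) for row in w_hid]
        j = pick_winner(h, beta, rng)
        k = pick_winner([row[j] for row in w_out], beta, rng)
        r = 1 if k == target else -1
        errors.append(0 if r == 1 else 1)
        # Assumed set of active synapses: active inputs -> j, and j -> k.
        active = [(w_hid, c_hid, j, i) for i in range(n_in) if x[i] == 1]
        active.append((w_out, c_out, k, j))
        for w, c, post, pre in active:
            c_new = c[post][pre] - r
            if c_new > theta:          # case (*) of Eq. (1): penalize
                w[post][pre] -= delta  # Eq. (2)
                c_new = theta
            elif c_new < 0:
                c_new = 0
            c[post][pre] = c_new
    return errors

def learning_curve(theta, beta=10.0, delta=1.0, trials=2000, runs=100, seed=0):
    """Error E(t): fraction of runs with an incorrect output at trial t."""
    rng = random.Random(seed)
    totals = [0] * trials
    for _ in range(runs):
        for t, e in enumerate(run_once(theta, beta, delta, trials, rng)):
            totals[t] += e
    return [s / runs for s in totals]
```

Calling learning_curve(theta) for theta in (0, 1, 2) gives
the error fraction per time step, which can be compared
qualitatively with Fig. 1. No numbers are quoted here
because the curves depend on the seed and on the
assumptions listed above.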

The dramatic effect of the synaptic memory can be
understood in the following way: Any synapse that is
involved in failure (meaning that pre- and postsynaptic
firings have occurred prior to unsuccessful output of the
network) is a candidate for decrement. In the case
$\Theta = 0$ all such "failing" synapses are weakened, such
that on repeated presentation of the same stimuli the
activity is likely to be led to a different output neuron.
This is a simple and reasonable principle as long as our
learning goal is the mapping of just one pattern of stimuli
or a set of non-overlapping patterns. However, the task of
learning a nontrivial logical operation, such as the one we
are facing here, requires a more elaborate mechanism: The
immediate weakening of all synapses that are involved in
failure for a certain pattern eventually destroys a useful
structure for the successful mapping of other patterns.
This is avoided by the synaptic memory considered here:
Only if a synapse is repeatedly involved in failure is its
efficacy reduced.

The idea of averaging over errors and updating the weights
on a slower time scale than sample presentation is well
known from batch learning methods [20]. In those methods,
errors are determined over a whole sweep through the
pattern set and subsequently weights are updated
synchronously. However, those algorithms fail to explain
learning in biological neural systems as they rely on
biologically implausible mechanisms such as
back-propagating errors. In fact, what we wish to define
here is a learning method based on purely local dynamics,
where weight changes are based only on information that is
locally available (the two adjacent neurons of a synapse)
with nothing more than a single global reinforcement
signal, which is exactly the information that is available
to a biological synapse. A first step in this direction
would be a trivial "localized" version of batch learning
where weight changes are based on the global reinforcement
signal only. Indeed, this works for single layer networks
but fails for learning XOR-type problems in multi-layer
networks. Here, our work proposes a solution, using a
synaptic error account combined with asynchronous updating
of the synaptic weights. It can be viewed as a
generalization of the Hebbian learning rule: While the Hebb
rule alone is not able to make a network learn the XOR, the
above extension does so. The resulting network is a
self-contained dynamical system with local dynamical rules
defined in a way that the overall network dynamics results
in adaptive learning of general logical functions including
the XOR problem. Besides learning XOR as shown here, the
algorithm also proved to learn logical functions of higher
dimensions and complexity.

The aspect of protecting synapses from too quick changes
has further implications with regard to the network's
robustness against noise. Fig. 2 demonstrates the effect of
the inverse noise level $\beta$ on the mean residual error
after 90,000 trials.




FIG. 2. Longer synaptic memory allows for learning in
noisier networks. The critical value of $\beta$ for the
transition between the non-learning (high error) and the
learning (low error) regime decreases with the memory
length $\Theta$, with larger memory meaning higher
robustness against noise. On the right vertical axis the
residual errors for deterministic networks
($\beta = \infty$) are shown. Without synaptic memory
($\Theta = 0$, diamond; this corresponds to the model of
Chialvo and Bak, see text) we still find a high error,
otherwise ($\Theta = 1, 2, 3$, large square) complete
learning is achieved. Displayed errors are averages over
time steps 90,000 to 100,000 of 100 independent simulation
runs.

For fixed memory length $\Theta$ we find a sharp transition
from a regime of non-learning, characterized by $E = 0.5$,
to a regime of effective learning with $E \to 0$. We
conclude that the network is able to learn just as long as
the information gain provided by the feedback signal is
larger than the information loss caused by the uncertainty
of the stochastic neural dynamics. The effect of increasing
the memory length $\Theta$ is clear: The critical point
between the two regimes is shifted to lower values of
$\beta$, i.e., higher noise. Synapses with larger memory
can average out the uncertainty and therefore enable
stochastically firing networks to adapt to their
environment.

Now let us briefly discuss the biological motivations for
the choice of mechanisms used in the model above. First,
observations in experimental neurobiology show clear
evidence that modulation of long-term potentiation (LTP)
and depression (LTD) via external signals occurs (i.e.,
modulation of plasticity of weights). In one example from
the hippocampus CA1 region, which is involved in learning
and memory formation, modulation mediated by dopamine has
been verified [21]. In particular, when dopamine is applied
during or shortly after LTD activity, one observes that LTD
is suppressed (and LTP can appear instead). Learning
activity can thereby receive feedback via dopamine which
then modulates synaptic plasticity, in particular LTD.
Indeed, hormone signals are widely known to interfere with
learning and memory formation. For example, adrenal
hormones have been shown to enhance susceptibility for LTD
[22], an effect which has even been found following
behavioral stress in living animals [23]. A broad class of
other factors that modulate synaptic plasticity has been
classified, sometimes summarized as "metaplasticity" [24].
We believe that further research in this area will provide
new insights into the computational mechanics of biological
nervous systems.

Furthermore, progress has been made in exploring the 
mechanisms of retrograde feedback in LTP and LTD. Ev- 
idence accumulates in favor of some physiological mech- 
anisms that feed back the postsynaptic activity to the 
presynaptic site. A possible mechanism recently pro- 
posed for LTD is the messenger nitric oxide evoking a 
specific presynaptic biochemical cascade which eventually
interacts with the intracellular mechanisms for vesicle
formation and loading [25]. The subsequently reduced
transmitter release establishes a long term depression of 
this synaptic pathway. An interesting observation is the 
long time scale of this process, of the order of 15 minutes
[25], especially when compared to that of neuronal firing
packages. This opens up the possibility that consider- 
able time averaging may occur in the course of inducing 
LTD. The effect of such a synaptic averaging on learning 
has been simulated above by an internal counter asso- 
ciated with each synaptic weight. Further experimental 
research on the timing of externally induced LTD and 
the lifetimes of the biochemical agents involved in the 
retrograde signaling cascade may show to what extent 
synaptic averaging in the induction of plasticity changes 
is established in nature. 

To summarize, we studied a biologically motivated 
model for goal-directed learning in multilayer neural net- 
works. In contrast to existing models, synaptic plastic- 
ity is based on a time-averaged individual failure rate of 
each synapse. Thereby, learning of general logical func- 
tions (including XOR) is possible on the basis of local 
synaptic plasticity alone, combined with homogeneous 
failure feedback. In particular, no error backpropagation 
is needed. The presented algorithm also works in the 
presence of noise, where internal errors are compensated 
for by the averaging of each synapse: only persistent fail- 
ure is punished. 